incorrect data


How Much of Your Data Can Suck? Thresholds for Domain Performance and Emergent Misalignment in LLMs

Ouyang, Jian, T, Arman, Jin, Ge

arXiv.org Artificial Intelligence

This paper investigates the impact of incorrect data on the performance and safety of large language models (LLMs), specifically gpt-4o, during supervised fine-tuning (SFT). As LLMs become increasingly vital across broad domains such as finance, coding, law, and health, fine-tuning on incorrect data can lead to "emergent misalignment": harmful or deceptive outputs unrelated to the intended task. We evaluate gpt-4o models fine-tuned with varying ratios (10% to 90% correct) of both obviously and subtly incorrect data across four domains: coding, finance, health, and legal. Our findings show that even modest amounts of incorrect data (10-25%) dramatically degrade domain performance, though not moral alignment. A clear threshold of at least 50% correct data is needed for models to consistently recover strong performance, yet they rarely match the robustness and safety of the base model, which exhibits near-perfect alignment and zero dangerous completions out of the box. This research emphasizes the heavy cost of incorrect data, highlighting the critical need for extremely high-quality data curation or, alternatively, for leveraging robust base models without unnecessary fine-tuning in high-stakes applications.
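The ratio sweep described above amounts to drawing a fixed-size training set from two pools, one of correct and one of incorrect examples. The sketch below is a minimal illustration under stated assumptions: the pools are hypothetical in-memory lists, and the function name `mix_dataset`, the seeding, and the ratio grid are illustrative choices, not the authors' pipeline.

```python
import random


def mix_dataset(correct, incorrect, correct_ratio, size, seed=0):
    """Hypothetical helper: sample `size` examples with a given fraction of correct ones."""
    if not 0.0 <= correct_ratio <= 1.0:
        raise ValueError("correct_ratio must lie in [0, 1]")
    n_correct = round(size * correct_ratio)
    n_incorrect = size - n_correct
    if n_correct > len(correct) or n_incorrect > len(incorrect):
        raise ValueError("not enough examples in one of the pools")
    rng = random.Random(seed)  # fixed seed so each mixture is reproducible
    # Sample without replacement from each pool, then interleave randomly.
    mixed = rng.sample(correct, n_correct) + rng.sample(incorrect, n_incorrect)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Illustrative pools; real SFT examples would be prompt/response records.
    correct_pool = [("correct", i) for i in range(1000)]
    incorrect_pool = [("incorrect", i) for i in range(1000)]
    # Illustrative grid spanning the 10%-90% correct range.
    for ratio in (0.1, 0.25, 0.5, 0.75, 0.9):
        data = mix_dataset(correct_pool, incorrect_pool, ratio, size=500)
        n_ok = sum(1 for label, _ in data if label == "correct")
        print(f"{ratio:.2f} correct -> {n_ok}/{len(data)} correct examples")
```

Holding the total size fixed while varying only the correct fraction keeps the amount of fine-tuning data constant across runs, so differences between runs can be attributed to data quality rather than data volume.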


Your business can tame AI hallucinations with this data-driven approach

FOX News

Picture this: you open your favorite food delivery app to order a late-night snack. You select your go-to order and finalize your purchase. When the food arrives, you find that they gave you ranch dressing to go with your cinnamon roll. You are sure you asked for extra icing on the side, so you check the app and confirm that you did ask for icing, yet received ranch.


Sensor Validation Using Dynamic Belief Networks

Nicholson, Ann, Brady, J. M.

arXiv.org Artificial Intelligence

The trajectory of a robot is monitored in a restricted dynamic environment using light beam sensor data. We use a Dynamic Belief Network (DBN), based on a discrete model of the domain, which provides discrete monitoring analogous to conventional quantitative filter techniques. Sensor observations are added to the basic DBN in the form of specific evidence. However, sensor data are often partially or totally incorrect. We show how the basic DBN, which, given incorrect data, can infer only an impossible combination of evidence, may be modified to handle specific types of incorrect data that may occur in the domain. We then present an extension to the DBN: the addition of an invalidating node, which models the status of the sensor as working or defective. This node provides a qualitative explanation of inconsistent data: it is caused by a defective sensor. Connecting successive instances of the invalidating node models the status of a sensor over time, allowing the DBN to handle both persistent and intermittent faults.
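The invalidating-node idea can be illustrated with a small forward filter over a joint hidden state (robot position, sensor status), where the status at each time step depends on the status at the previous step. This is a minimal sketch, not the authors' model: the 1-D track, the single beam position, and the parameters `p_move`, `p_fail`, `p_recover`, and `eps` are all hypothetical.

```python
# Hypothetical forward filter for a DBN with an "invalidating node".
# Hidden state per time step: (robot position, sensor status).
# The track layout and all probabilities are illustrative assumptions.

WORKING, DEFECTIVE = "working", "defective"


def likelihood(beam_broken, x, status, beam_at, eps):
    """P(observation | position, sensor status)."""
    if status == DEFECTIVE:
        # A defective sensor's reading carries no information about the robot.
        return 0.5
    expected = (x == beam_at)  # a working beam is broken only when the robot is there
    return 1.0 - eps if beam_broken == expected else eps


def filter_sensor_status(observations, n_positions=10, beam_at=5, p_move=0.8,
                         p_fail=0.01, p_recover=0.1, eps=0.02):
    """Return P(sensor defective) after each observation."""
    # Prior: robot anywhere on the track, sensor very probably working.
    belief = {}
    for x in range(n_positions):
        belief[(x, WORKING)] = 0.99 / n_positions
        belief[(x, DEFECTIVE)] = 0.01 / n_positions

    # Links between successive invalidating nodes.
    # p_recover > 0 models intermittent faults; p_recover = 0 makes faults persistent.
    status_next = {
        WORKING: [(WORKING, 1.0 - p_fail), (DEFECTIVE, p_fail)],
        DEFECTIVE: [(DEFECTIVE, 1.0 - p_recover), (WORKING, p_recover)],
    }

    history = []
    for beam_broken in observations:
        # Predict: robot moves one cell forward (or stays), sensor status evolves.
        predicted = {key: 0.0 for key in belief}
        for (x, s), p in belief.items():
            moves = [(min(x + 1, n_positions - 1), p_move), (x, 1.0 - p_move)]
            for x2, p_x in moves:
                for s2, p_s in status_next[s]:
                    predicted[(x2, s2)] += p * p_x * p_s
        # Update: weight by the observation likelihood and renormalise.
        for (x, s) in predicted:
            predicted[(x, s)] *= likelihood(beam_broken, x, s, beam_at, eps)
        total = sum(predicted.values())
        belief = {key: v / total for key, v in predicted.items()}
        history.append(sum(v for (_, s), v in belief.items() if s == DEFECTIVE))
    return history


if __name__ == "__main__":
    stuck = filter_sensor_status([True] * 20)   # beam reported broken 20 steps in a row
    quiet = filter_sensor_status([False] * 20)  # beam never reported broken
    print(f"P(defective) after stuck readings: {stuck[-1]:.3f}")
    print(f"P(defective) after quiet readings: {quiet[-1]:.3f}")
```

A beam that stays "broken" is hard to explain with a working sensor, because the robot would have to remain at the beam for many steps. The filter therefore shifts probability onto the defective status, which is the qualitative explanation the invalidating node provides. Readings that are consistent with the motion model leave the defective probability low.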